Hardware implementation of the pseudo-spectral time-domain method

ABSTRACT

A computer hardware configuration for performing the pseudo-spectral time-domain (PSTD) method on data. The hardware configuration includes a forward fast Fourier transform (FFT) unit that calculates a forward fast Fourier transform (FFT) from the data, and a complex multiplication unit that receives the FFT-processed data and calculates a spatial derivative in the frequency domain from the FFT-processed data. The hardware configuration further includes an inverse fast Fourier transform (IFFT) unit that converts the spatial derivative in the frequency domain from the complex multiplication unit into the time domain, and a computation engine that solves a PSTD equation based upon the spatial derivative in the time domain received from the IFFT unit.

CLAIM FOR PRIORITY

The present application is a U.S. National Stage application filed under 35 U.S.C. § 371, claiming priority of International Application No. PCT/US2003/020259, filed Jun. 24, 2003, and U.S. Provisional Patent Application Ser. No. 60/390,993, filed Jun. 24, 2002, respectively, under 35 U.S.C. §§ 119 and 365, the disclosures of the above-referenced applications being incorporated by reference herein in their entireties.

BACKGROUND OF THE INVENTION

A. Field of the Invention

The present invention relates generally to hardware accelerators, and, more particularly, to a hardware implementation of the pseudo-spectral time-domain (PSTD) method.

B. Description of the Related Art

Since the advent of the modern computer, a great deal of effort has gone into the development of numerical algorithms for the rigorous solution of electromagnetic problems. Today, popular numerical approaches include the finite-element method (FEM), method of moments (MOM), modal expansion techniques, boundary integral methods, and time-domain methods such as the finite-volume time-domain (FVTD), multi-resolution time-domain (MRTD), and finite-difference time-domain (FDTD) methods. Each of these algorithms possesses clear advantages and disadvantages depending on the specific application. One algorithm in particular, the pseudo-spectral time-domain (PSTD) method, shows particular promise. In comparison to the above methods, the PSTD technique can require far less memory while maintaining, and in fact improving, the accuracy and versatility of electromagnetic analysis. To this end, recent numerical experiments have confirmed that, for a fixed amount of computational resources, the PSTD method can analyze problems two to three orders of magnitude larger than an FDTD method with the same level of accuracy.

However, one of the difficulties of the PSTD method is the large number of forward and inverse fast Fourier transforms (FFTs and IFFTs) that need to be computed, which significantly slows down the analysis. In comparison to the FDTD method, which has order N computational dependence, the PSTD method has order N log N. Thus, as a pure software implementation, the PSTD method is far less appealing in terms of computational cost.

Over the last several decades, significant effort has been put into realizing application-specific integrated circuits (ASICs) for application to digital signal processing (DSP). As a result, ASICs are currently available that perform the FFT and IFFT operations in fractions of a microsecond. Thus, the PSTD method is far more attractive to implement in hardware than the FDTD method, due to the wealth of technology that it can leverage. While the FDTD method can also be realized in hardware, it suffers from the fact that it requires extraordinary amounts of computer memory. In comparison, the PSTD method can analyze problems two to three orders of magnitude larger than other full-wave techniques, the FDTD method in particular, with the same level of accuracy.

Despite the advantages of the FDTD acceleration hardware over software-based implementations, the FDTD method requires a tremendous amount of memory. This can greatly limit the size of the problems that are capable of being solved. Because the PSTD method requires fewer samples than the FDTD method (on the order of 1000 times fewer), much larger problems can be solved with the PSTD method given the same amount of memory.

Although PSTD methods are accurate and well defined, current computer system technology limits the speed at which these operations can be performed. Using the PSTD method to solve a non-trivial problem can take hours, days, weeks, or even months. Some problems are too large to be solved effectively within practical time constraints.

Thus, there is a need in the art to overcome the limitations of the related art, and to provide for a practical hardware implementation of the PSTD method.

SUMMARY OF THE INVENTION

The present invention solves the problems of the related art by providing a hardware-based PSTD processor that capitalizes on the large advancements in the area of DSP ASICs. By combining the PSTD algorithm with modern DSP chips and large-scale FPGAs, the PSTD processor will be capable of solving very large radiation problems in computation times short enough to enable iterative design. Given this potential, a hardware implementation of the PSTD method offers the ability to address an extraordinary array of electromagnetic design problems that heretofore have been impossible to solve.

Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWING

The present invention will become more fully understood from the detailed description given hereinbelow and the accompanying drawing, which are given by way of illustration only, and thus are not limitative of the present invention, and wherein:

FIG. 1 is a schematic diagram showing a hardware implementation of the PSTD method in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.

Equations in the PSTD method take the following form: $E_{ab} = A E_{ab} + B \frac{\partial H_{c}}{\partial b} + C E_{ab}^{inc} \qquad (1)$ where a, b, and c are directions (x, y, or z), A, B, and C are coefficients based on the material properties of the medium, and E^(inc)_(ab) is the incident field associated with the node.

Unlike the finite-difference time-domain (FDTD) method, where both spatial and temporal derivatives are represented by finite differences, the PSTD method uses FFTs and IFFTs to calculate the spatial derivatives. For example, recall from basic frequency-domain calculations that the spatial derivative of a field, H_(z), in the y-direction can be written as: $\frac{\partial H_{z}}{\partial y} = \mathrm{IFFT}\left( j \cdot k_{y} \cdot \mathrm{FFT}\left( H_{z}(i,:,k) \right) \right) \qquad (2)$ where, in equation (2), j denotes the imaginary unit and k_(y) denotes the spatial frequency (wavenumber) in the y-direction.
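As an illustration only, equation (2) can be sketched in software. The following Python sketch is not the hardware of the invention; it assumes a periodic line of samples whose length is a power of two, uses a simple recursive radix-2 FFT written for this example, and zeroes the Nyquist bin (a common convention for odd-order derivatives, assumed here). The names `fft`, `wavenumbers`, and `spectral_derivative` are hypothetical.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 transform (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out


def wavenumbers(n, spacing):
    """Angular wavenumbers k_y in FFT bin order for n samples spaced `spacing` apart."""
    ks = []
    for m in range(n):
        index = m if m < n // 2 else m - n  # upper bins hold negative frequencies
        if 2 * m == n:
            index = 0  # assumption: zero the Nyquist bin
        ks.append(2.0 * math.pi * index / (n * spacing))
    return ks


def spectral_derivative(line, spacing):
    """Derivative along one mesh line: IFFT(j * k_y * FFT(line)), as in equation (2)."""
    n = len(line)
    spectrum = fft(line)
    scaled = [1j * k * s for k, s in zip(wavenumbers(n, spacing), spectrum)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(scaled, inverse=True)]
```

A single call returns the derivative at every sample on the line, so one fetch of N values yields the derivative for N mesh points.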

To compute the spatial derivative of H_(z) in the y-direction for some node (i, j, k), all of the values of H_(z) in the y-direction along the line (i, :, k) are required. Although this can require a tremendous amount of data to be fetched (depending on the mesh size), this operation only needs to be performed once for all values that require this derivative along this line. If there are N values in the y-direction, by fetching N values from memory, the spatial derivative in the y-direction for N different mesh points may be computed. Although there is a significant latency associated with the fetch operation, this operation only needs to be performed once per derivative, per timestep.

Despite the large amount of data required to perform the FFT/IFFT operations, it is possible to efficiently organize RAM to maximize throughput. For example, if the values of H_(z) were stored in RAM in increasing-y order, it would be possible to perform a burst-read from RAM. Bursting allows the RAM to fetch contiguous locations very rapidly. Thus, burst reads allow fetching of all of the values necessary to compute the derivative, while, at the same time, maximizing the throughput of the RAM. Because spatial derivatives are required in the x, y, and z-directions, values would need to be stored in the RAM in three different patterns (increasing-x, increasing-y, and increasing-z). This would permit taking advantage of the bursting capabilities when computing any required derivative. Similarly, there will be three different “update” orders. The update order specifies the order in which nodes in the mesh will be updated. Updating nodes in the same order in which they are stored in RAM allows reuse of the spatial derivatives. Once the spatial derivative of a field is known in a specific direction, that value can be used to update any node along that line. Thus, by solving fields along that line, the same value can be used over and over again without incurring the latency of the RAM fetching and the FFT/IFFT operations.
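The three storage patterns described above can be illustrated with a small addressing sketch. This is a software model only; the function names `address` and `line_addresses` and the particular nesting of the two slower axes are assumptions made for this example and are not specified by the description.

```python
def address(i, j, k, shape, fastest):
    """Linear address of node (i, j, k) when axis `fastest` varies fastest in RAM."""
    nx, ny, nz = shape
    if fastest == "x":
        return i + nx * (j + ny * k)
    if fastest == "y":
        return j + ny * (k + nz * i)
    if fastest == "z":
        return k + nz * (i + nx * j)
    raise ValueError("fastest must be 'x', 'y', or 'z'")


def line_addresses(i, j, k, shape, axis, layout):
    """Addresses of every node on the line through (i, j, k) along `axis`,
    for a field stored in the given `layout` ('x', 'y', or 'z' fastest)."""
    nx, ny, nz = shape
    if axis == "x":
        return [address(m, j, k, shape, layout) for m in range(nx)]
    if axis == "y":
        return [address(i, m, k, shape, layout) for m in range(ny)]
    if axis == "z":
        return [address(i, j, m, shape, layout) for m in range(nz)]
    raise ValueError("axis must be 'x', 'y', or 'z'")
```

When the layout matches the derivative direction, the addresses of a line are consecutive and can be fetched in one burst; with a mismatched layout they are strided, which is why the description calls for three stored copies.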

FIG. 1 shows one logical data flow for a PSTD accelerator 10 in accordance with the present invention. Note that there are three parallel datapaths 12, 14, 16. These datapaths 12, 14, 16 are completely independent of one another (except for a memory subsystem 18) and each solves fields in a given direction (x, y, or z). Each of these datapaths 12, 14, 16 is responsible for solving an equation of the form shown in equation (1).

The flow within each computational path begins by determining the spatial derivative needed for the computation. The necessary values must be fetched from the memory subsystem 18 and streamed into an FFT unit 20 that performs the FFT on the values. Depending on the size of the problem being solved and the capabilities of the FFT unit 20, this computation can easily require several thousand cycles. While the FFT is being computed by the FFT unit 20, the other necessary data, including the primary fields, incident fields, and coefficients, can all be fetched from memory subsystem 18 or computed. This entire computational path will be pipelined to hide much of the latency of the FFT operation. As results begin emerging from the FFT unit 20, the results are streamed through a pipelined complex multiplication unit 22 that solves the multiplication aspects of equation (2) set forth above. At this point, the spatial derivatives will have been computed in the frequency domain. The next step in each datapath 12, 14, 16 is to convert the frequency domain result back into the time domain by means of an IFFT unit 24 that performs an IFFT. IFFT unit 24 will also incur a latency of several thousand cycles. As results begin emerging from the IFFT unit 24, the results are streamed into a Computation Engine (CE) 26. The CE 26, given the necessary data and the spatial derivative, solves equation (1) set forth above. Once complete, the fields are written back to the memory subsystem 18 by CE 26.
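The stage ordering of one datapath can be modeled loosely in software with chained generators. The sketch below is a behavioral illustration under stated assumptions, not a description of the actual hardware: a naive O(N^2) transform stands in for FFT unit 20 and IFFT unit 24, the memory subsystem is a plain dictionary, the coefficients A, B, and C are taken as scalars (a homogeneous medium), and all function and key names are hypothetical.

```python
import cmath
import math


def dft_stage(samples, inverse=False):
    """Stand-in for FFT unit 20 / IFFT unit 24 (naive transform, assumption).
    The whole line is buffered first, then bins are emitted one at a time."""
    samples = list(samples)
    n = len(samples)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    for m in range(n):
        acc = 0j
        for p, value in enumerate(samples):
            acc += value * cmath.exp(sign * 2j * math.pi * m * p / n)
        yield acc * scale


def multiply_stage(bins, wavenumbers):
    """Stand-in for complex multiplication unit 22: multiply each bin by j * k."""
    for k, value in zip(wavenumbers, bins):
        yield 1j * k * value


def computation_engine(derivative, e_old, e_inc, a, b, c):
    """Stand-in for CE 26: apply equation (1) node by node as results stream in."""
    for d, e, inc in zip(derivative, e_old, e_inc):
        yield a * e + b * d.real + c * inc


def run_datapath(memory, h_key, e_key, inc_key, wavenumbers, a, b, c):
    """Fetch a line, chain the four stages, and write the updated field back."""
    spectrum = dft_stage(memory[h_key])
    scaled = multiply_stage(spectrum, wavenumbers)
    derivative = dft_stage(scaled, inverse=True)
    updated = computation_engine(derivative, memory[e_key], memory[inc_key], a, b, c)
    memory[e_key] = list(updated)  # write-back to the (modeled) memory subsystem
```

Because the stages are generators, each one consumes results as the previous stage produces them, which mirrors the streaming behavior described above, although it does not model cycle-level latency.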

It will be apparent to those skilled in the art that various modifications and variations can be made in the hardware implementation of the PSTD method of the present invention and in construction of the hardware without departing from the scope or spirit of the invention.

For example, although FIG. 1 shows three parallel computational datapaths 12, 14, 16, more or fewer such datapaths may be provided. Fewer datapaths may be implemented to save on hardware, or extra datapaths may be included to increase parallelism. The actual number of datapaths is mostly dependent upon the capabilities of the memory subsystem 18.

Also, the FFT and IFFT units 20, 24 may or may not be co-located with complex multiplication unit 22 and CE 26. FFT and IFFT units 20, 24 may be implemented inside of a field-programmable gate array (FPGA). Alternatively, custom FFT/IFFT chips may be purchased, or DSP chips may be programmed to solve the FFT/IFFT equations. The latter configurations require chips external to the FPGA, which would result in a more complex printed circuit board and added latency in transferring data into and out of the FPGA. On the other hand, DSP chips are extremely fast and would save FPGA resources. Nevertheless, with very advanced FPGAs already on the market, an implementation where the FFT/IFFT operations are performed inside the FPGA is quite possible.

By implementing the PSTD method in hardware, computational speedup is achieved that allows solving problems much faster than current software-based methods and also allows solving problems that were heretofore unsolvable due to time constraints. The hardware implementation of the PSTD method of the present invention may be used in any field or application utilizing a software method based on a pseudo-spectral time-domain approach.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

1. A system for performing the pseudo-spectral time-domain (PSTD) method on data, comprising: a forward fast Fourier transform (FFT) unit calculating a forward fast Fourier transform (FFT) from the data; a complex multiplication unit receiving the FFT-processed data and calculating a spatial derivative in the frequency domain from the FFT-processed data; an inverse fast Fourier transform (IFFT) unit converting the spatial derivative in the frequency domain from the complex multiplication unit into the time domain; and a computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from the IFFT unit.
2. A system as recited in claim 1, wherein the PSTD equation takes the form: $E_{ab} = A E_{ab} + B \frac{\partial H_{c}}{\partial b} + C E_{ab}^{inc}$, where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and E^(inc)_(ab) is the incident field associated with the node.
3. A system as recited in claim 1, wherein as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.
4. A system as recited in claim 1, wherein the FFT and IFFT units are provided inside a field-programmable gate array (FPGA).
5. A system as recited in claim 4, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.
6. A system for performing the pseudo-spectral time-domain (PSTD) method on data, comprising: a plurality of forward fast Fourier transform (FFT) units, each FFT unit calculating a forward fast Fourier transform (FFT) from the data; a plurality of complex multiplication units, each complex multiplication unit receiving the FFT-processed data from a corresponding FFT unit and calculating a spatial derivative in the frequency domain from the FFT-processed data; a plurality of inverse fast Fourier transform (IFFT) units, each IFFT unit converting the spatial derivative in the frequency domain from a corresponding complex multiplication unit into the time domain; and a plurality of computation engines, each computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from a corresponding IFFT unit.
7. A system as recited in claim 6, wherein the PSTD equation takes the form: $E_{ab} = A E_{ab} + B \frac{\partial H_{c}}{\partial b} + C E_{ab}^{inc}$, where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and E^(inc)_(ab) is the incident field associated with the node.
8. A system as recited in claim 6, wherein as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.
9. A system as recited in claim 6, wherein the plurality of FFT and IFFT units are provided inside a field-programmable gate array (FPGA).
10. A system as recited in claim 9, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.
11. A computer hardware configuration for performing the pseudo-spectral time-domain (PSTD) method on data, comprising: a forward fast Fourier transform (FFT) unit calculating a forward fast Fourier transform (FFT) from the data; a complex multiplication unit receiving the FFT-processed data and calculating a spatial derivative in the frequency domain from the FFT-processed data; an inverse fast Fourier transform (IFFT) unit converting the spatial derivative in the frequency domain from the complex multiplication unit into the time domain; and a computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from the IFFT unit.
12. A computer hardware configuration as recited in claim 11, wherein the PSTD equation takes the form: $E_{ab} = A E_{ab} + B \frac{\partial H_{c}}{\partial b} + C E_{ab}^{inc}$, where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and E^(inc)_(ab) is the incident field associated with the node.
13. A computer hardware configuration as recited in claim 11, wherein as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.
14. A computer hardware configuration as recited in claim 11, wherein the FFT and IFFT units are provided inside a field-programmable gate array (FPGA).
15. A computer hardware configuration as recited in claim 14, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.
16. A computer hardware configuration for performing the pseudo-spectral time-domain (PSTD) method on data, comprising: a plurality of forward fast Fourier transform (FFT) units, each FFT unit calculating a forward fast Fourier transform (FFT) from the data; a plurality of complex multiplication units, each complex multiplication unit receiving the FFT-processed data from a corresponding FFT unit and calculating a spatial derivative in the frequency domain from the FFT-processed data; a plurality of inverse fast Fourier transform (IFFT) units, each IFFT unit converting the spatial derivative in the frequency domain from a corresponding complex multiplication unit into the time domain; and a plurality of computation engines, each computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from a corresponding IFFT unit.
17. A computer hardware configuration as recited in claim 16, wherein the PSTD equation takes the form: $E_{ab} = A E_{ab} + B \frac{\partial H_{c}}{\partial b} + C E_{ab}^{inc}$, where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and E^(inc)_(ab) is the incident field associated with the node.
18. A computer hardware configuration as recited in claim 16, wherein as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.
19. A computer hardware configuration as recited in claim 16, wherein the plurality of FFT and IFFT units are provided inside a field-programmable gate array (FPGA).
20. A computer hardware configuration as recited in claim 19, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.